 kernel k-means


Statistical and Computational Trade-Offs in Kernel K-Means

Neural Information Processing Systems

We investigate the efficiency of k-means in terms of both statistical and computational requirements. More precisely, we study a Nyström approach to kernel k-means. We analyze the statistical properties of the proposed method and show that it achieves the same accuracy as exact kernel k-means with only a fraction of the computations. Indeed, we prove under basic assumptions that sampling $\sqrt{n}$ Nyström landmarks greatly reduces computational costs without incurring any loss of accuracy. To the best of our knowledge, this is the first result of this kind for unsupervised learning.
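
The idea in this abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a Gaussian kernel, uniformly sampled landmarks, a landmark count of $\lceil\sqrt{n}\rceil$, and a deterministic farthest-point initialization; the function names `rbf_kernel`, `nystrom_features`, `farthest_point_init`, and `lloyd`, the bandwidth `gamma`, and the toy data are all hypothetical choices made for illustration. Plain k-means is then run on the resulting Nyström features instead of on the full $n \times n$ kernel matrix.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def nystrom_features(X, m, gamma, rng):
    """Map X to at most m features whose inner products approximate the kernel."""
    # Assumption: landmarks are drawn uniformly without replacement.
    landmarks = X[rng.choice(len(X), size=m, replace=False)]
    K_mm = rbf_kernel(landmarks, landmarks, gamma)
    K_nm = rbf_kernel(X, landmarks, gamma)
    # Features are K_nm @ K_mm^{-1/2}, computed via an eigendecomposition.
    w, V = np.linalg.eigh(K_mm)
    keep = w > 1e-8 * w.max()  # drop numerically null directions
    return K_nm @ (V[:, keep] / np.sqrt(w[keep]))

def farthest_point_init(Z, k):
    """Deterministic initialization (an assumption made for reproducibility)."""
    centers = [Z[0]]
    for _ in range(k - 1):
        d = np.min([((Z - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(Z[d.argmax()])
    return np.array(centers)

def lloyd(Z, k, n_iter=100):
    """Standard Lloyd iterations on explicit feature vectors."""
    centers = farthest_point_init(Z, k)
    for _ in range(n_iter):
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        new_centers = np.array([
            Z[labels == j].mean(0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Toy data (hypothetical): two well-separated Gaussian blobs in the plane.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (100, 2)),
               rng.normal(10.0, 0.5, (100, 2))])
m = int(np.ceil(np.sqrt(len(X))))  # on the order of sqrt(n) landmarks
Z = nystrom_features(X, m, gamma=0.1, rng=rng)
labels = lloyd(Z, k=2)
print(Z.shape, np.bincount(labels))
```

In this sketch each Lloyd iteration works with an $n \times m$ feature matrix, so its cost grows with $nm$ rather than $n^2$, which is where the computational saving comes from.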



Statistical and Computational Trade-Offs in Kernel K-Means

Daniele Calandriello, Lorenzo Rosasco

Neural Information Processing Systems

More precisely, we study a Nyström approach to kernel k-means. We analyze the statistical properties of the proposed method and show that it achieves the same accuracy as exact kernel k-means with only a fraction of the computations.


Means

Neural Information Processing Systems

In Biau et al. (2008), they employ the randomized sketches method to project the data in Hilbert space so as to approximate kernel k-means. However, the data in Hilbert space are implicit and infinite-dimensional, and its sketch matrix is dense and unstructured.


Fair Kernel K-Means: from Single Kernel to Multiple Kernel

Neural Information Processing Systems

Kernel k-means has been widely studied in machine learning. However, existing kernel k-means methods often ignore the fairness issue, which may cause discrimination. To address this issue, we propose a novel Fair Kernel K-Means (FKKM) framework. In this framework, we first propose a new fairness regularization term that can lead to a fair partition of the data. The carefully designed fairness regularization term has a form similar to that of kernel k-means, so it can be seamlessly integrated into the kernel k-means framework. Then, we extend this method to the multiple kernel setting, leading to a Fair Multiple Kernel K-Means (FMKKM) method. We also provide a theoretical analysis of the generalization error bound and, based on this bound, give a strategy for setting the hyper-parameter, which makes the proposed methods easy to use. Finally, we conduct extensive experiments in both the single kernel and multiple kernel settings, comparing the proposed methods with state-of-the-art methods to demonstrate their effectiveness.


Kernel K-means clustering of distributional data

Baíllo, Amparo, Berrendero, Jose R., Sánchez-Signorini, Martín

arXiv.org Machine Learning

We consider the problem of clustering a sample of probability distributions from a random distribution on $\mathbb R^p$. Our proposed partitioning method makes use of a symmetric, positive-definite kernel $k$ and its associated reproducing kernel Hilbert space (RKHS) $\mathcal H$. By mapping each distribution to its corresponding kernel mean embedding in $\mathcal H$, we obtain a sample in this RKHS where we carry out the $K$-means clustering procedure, which provides an unsupervised classification of the original sample. The procedure is simple and computationally feasible even for dimension $p>1$. The simulation studies provide insight into the choice of the kernel and its tuning parameter. The performance of the proposed clustering procedure is illustrated on a collection of Synthetic Aperture Radar (SAR) images.
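
The procedure described in this abstract can be sketched as follows, under assumptions that are not taken from the paper: each distribution is observed only through a finite sample, its kernel mean embedding is replaced by the empirical average, the kernel is Gaussian with a hypothetical bandwidth `gamma`, and the iterations start from a simple round-robin labelling. The inner product of two empirical embeddings is the average of $k(x, y)$ over all pairs of points, so $K$-means in $\mathcal H$ only needs the Gram matrix of the embeddings. The names `rbf`, `embedding_gram`, and `kernel_kmeans` are illustrative.

```python
import numpy as np

def rbf(A, B, gamma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def embedding_gram(samples, gamma):
    """Gram matrix G[i, j] = <mu_i, mu_j> of the empirical mean embeddings."""
    N = len(samples)
    G = np.empty((N, N))
    for i in range(N):
        for j in range(i, N):
            # Average of k(x, y) over x in sample i and y in sample j.
            G[i, j] = G[j, i] = rbf(samples[i], samples[j], gamma).mean()
    return G

def kernel_kmeans(G, k, init_labels, n_iter=100):
    """K-means in the RKHS, using only inner products stored in G."""
    labels = np.asarray(init_labels).copy()
    for _ in range(n_iter):
        dist = np.full((len(G), k), np.inf)
        for c in range(k):
            members = labels == c
            if not members.any():
                continue  # an empty cluster attracts no points
            # ||mu_i - centroid_c||^2 expanded in terms of Gram entries.
            dist[:, c] = (np.diag(G)
                          - 2.0 * G[:, members].mean(1)
                          + G[np.ix_(members, members)].mean())
        new_labels = dist.argmin(1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

# Toy data (hypothetical): six distributions on R^2, each seen through 50 draws.
# The first three are centred at 0 and the last three at 3.
rng = np.random.default_rng(0)
samples = [rng.normal(0.0, 1.0, (50, 2)) for _ in range(3)]
samples += [rng.normal(3.0, 1.0, (50, 2)) for _ in range(3)]

G = embedding_gram(samples, gamma=0.5)
labels = kernel_kmeans(G, k=2, init_labels=np.arange(len(samples)) % 2)
print(labels)
```

The round-robin start is used only to keep the sketch deterministic; in practice several random restarts would be a more usual choice.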